How to Avoid Burning Ducks: Combining Linguistic Analysis and Corpus Statistics for German Compound Processing

نویسندگان

  • Fabienne Fritzinger
  • Alexander Fraser
چکیده

Compound splitting is an important problem in many NLP applications which must be solved in order to address issues of data sparsity. Previous work has shown that linguistic approaches for German compound splitting produce a correct splitting more often, but corpus-driven approaches work best for phrase-based statistical machine translation from German to English, a worrisome contradiction. We address this situation by combining linguistic analysis with corpus-driven statistics and obtaining better results in terms of both producing splittings according to a gold standard and statistical machine translation performance.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

How to Avoid Burning Ducks: How to Avoid Burning Ducks: Combining Linguistic Analysis and Corpus Statistics for German Compound Processing

Compound splitting is an important problem in many NLP applications which must be solved in order to address issues of data sparsity. Previous work has shown that linguistic approaches for German compound splitting produce a correct splitting more often, but corpus-driven approaches work best for phrase-based statistical machine translation from German to English, a worrisome contradiction. We ...

متن کامل

ArchiMob - A Corpus of Spoken Swiss German

Swiss dialects of German are, unlike most dialects of well standardised languages, widely used in everyday communication. Despite this fact, automatic processing of Swiss German is still a considerable challenge due to the fact that it is mostly a spoken variety rarely recorded and that it is subject to considerable regional variation. This paper presents a freely available general-purpose corp...

متن کامل

Cultural Influence on the Expression of Cathartic Conceptualization in English and Spanish: A Corpus-Based Analysis

This paper investigates the conceptualization of emotional release from a cognitive linguistics perspective (Cognitive Metaphor Theory). The metaphor weeping is a means of liberating contained emotions is grounded in universal embodied cognition and is reflected in linguistic expressions in English and Spanish. Lexicalization patterns which encapsulate this conceptualization i...

متن کامل

Abstract Anaphors in German and English

Anaphors in German and English Stefanie Dipper, Christine Rieger, Melanie Seiss, and Heike Zinsmeister 1 Ruhr-University Bochum, 44780 Bochum, Germany 2 University of Konstanz, 78457 Konstanz, Germany Abstract. Abstract anaphors refer to abstract referents such as facts or events. Automatic resolution of this kind of anaphora still poses a problem for language processing systems. The present pa...

متن کامل

An Overview of Corpus-Based Statistics-Oriented(CBSO) Techniques for Natural Language Processing

A Corpus-Based Statistics-Oriented (CBSO) methodology, which is an attempt to avoid the drawbacks of traditional rule-based approaches and purely statistical approaches, is introduced in this paper. Rule-based approaches, with rules induced by human experts, had been the dominant paradigm in the natural language processing community. Such approaches, however, suffer from serious difficulties in...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010